by

Sheldon  Ross 


1.  Introduction 

This paper considers the following model: A production process
produces items at the beginning of distinct time periods $t = 0, 1, 2, \ldots$.
It is supposed that at any time $t$ the production process may be in any
one of a countable number of states $0, 1, 2, \ldots$ and that the quality of
the item produced is a function of this underlying state. It is also
supposed that the state of the process at time $t$ is not known and can
only be determined by sampling the item produced. If the process is in
state $i$ then a cost $I_i$ is involved in sampling the item. The purpose
of sampling is not to replace poor items by good ones but rather to
check the manufacturing process.

Thus at the beginning of a period one must decide whether to inspect
the item produced or not. Also one may decide to revise the process.
This might be done, for instance, if an item had been sampled the
previous period and had shown that the production process was in a poor
state. The cost associated with revising a process in state $i$ is $R_i$.

It is supposed that if the process is revised at the beginning of
period $t$ then it will be in state 0 at the end of period $t$. Also it
is assumed that no item is produced during that period. If the process
is in state $i$ at the beginning of period $t$ and is not revised then it
will remain in state $i$ during the remainder of that period. If the
process is in state $i$ at the end of a period then, with probability
$P_{ij}$, it will be in state $j$ at the beginning of the next period.

If the process is in state $i$ and an item is produced without
inspection then there is a cost $C_i$ incurred. (The inspection cost $I_i$
may be thought of as already including a production cost.) However we
suppose that this cost does not directly become known and thus
cannot be used to indicate the state of the process. It is assumed
that all costs and transition probabilities are known, and that all
costs are bounded.

In this paper a framework is provided for handling problems of
this nature. In Section 2 a method of indicating the (observed)
"state" of the system at any time $t$ is given and some theorems relating
to the convexity of the optimal inspection and revision regions are
proven. In Section 3 a two-state production process is considered
and the structure of the optimal policy is determined. An interesting
sidelight of this is that the optimal policy doesn't necessarily have
the simple form which intuition might lead one to predict. In
Section 4 we treat the case where one of the parameters of the model is
not fully known.

The general model considered here is similar to one considered in
[2]. However, both the methods employed and the results obtained are
different. It should also be mentioned that the above model need not
be interpreted solely in a quality control context but may also be
interpreted as a model for machine deterioration when inspection is
costly.

2.  General  Model


We shall say that the system is in state $P = (P_0, P_1, \ldots)$ if with
probability $P_i$ the underlying process is in state $i$, $i = 0, 1, \ldots$. We
let $S = \{P : P_i \ge 0,\ \sum_i P_i = 1\}$ denote the state space
of the system. Thus we are allowing for the possibility that at time
$t = 0$ we only know the underlying state of the production process up
to some arbitrary probability distribution.

Let $X_t$ denote the state of the system at the beginning of
period $t$; and let $A_t$ denote the action chosen at $t$: either produce
without inspection (P), produce with inspection (I), or revise the
process (R).

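Let

$$C(X_t, A_t) = \begin{cases}
\sum_i P_i C_i & \text{if } X_t = (P_0, P_1, \ldots) \text{ and } A_t = \mathrm{P} \\
\sum_i P_i I_i & \text{if } X_t = (P_0, P_1, \ldots) \text{ and } A_t = \mathrm{I} \\
\sum_i P_i R_i & \text{if } X_t = (P_0, P_1, \ldots) \text{ and } A_t = \mathrm{R}
\end{cases}$$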

A policy is any (measurable) rule for choosing actions. For any
policy $R$ and $\beta \in (0,1)$ let

$$\psi(P, \beta, R) = E_R\Big[\sum_{t=0}^{\infty} \beta^t C(X_t, A_t) \,\Big|\, X_0 = P\Big],$$

and let $V_\beta(P) = \inf_R \psi(P, \beta, R)$, $P \in S$. Thus $V_\beta(P)$ is the
expected cost incurred when an optimal policy is employed given that the
system starts in state $P$ and future costs are discounted by a factor $\beta$.

For any $P \in S$, let $TP = ((TP)_0, (TP)_1, \ldots)$ where $(TP)_j = \sum_i P_i P_{ij}$,
$j = 0, 1, \ldots$, and let $e^i = (P_{i0}, P_{i1}, \ldots)$, $i = 0, 1, \ldots$. Then

$$P\{X_{t+1} = TP \mid X_t = P,\ A_t = \mathrm{P}\} = 1$$

$$P\{X_{t+1} = e^i \mid X_t = P,\ A_t = \mathrm{I}\} = P_i, \quad i = 0, 1, \ldots$$

$$P\{X_{t+1} = e^0 \mid X_t = P,\ A_t = \mathrm{R}\} = 1$$
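The three transition laws above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the state space is truncated to finitely many states, and the function names `propagate`, `after_inspection` and `after_revision` are hypothetical. The sample matrix is the two-state chain used later in Section 3, with a bad state that is absorbing until revision.

```python
def propagate(belief, trans):
    """Belief TP after producing without inspection: (TP)_j = sum_i P_i * P_ij."""
    n = len(trans)
    return [sum(belief[i] * trans[i][j] for i in range(n)) for j in range(n)]


def after_inspection(trans, i):
    """Belief e^i at the next period once inspection has revealed state i."""
    return list(trans[i])


def after_revision(trans):
    """Belief e^0 at the next period following a revision."""
    return list(trans[0])


# Assumed sample data: two states, deterioration probability pi = 0.1.
pi = 0.1
trans = [[1.0 - pi, pi], [0.0, 1.0]]
belief = [0.5, 0.5]
next_belief = propagate(belief, trans)
```

With these sample numbers `next_belief` puts probability .55 on the bad state, which agrees with the two-state formula $TP = P + \pi - \pi P$ evaluated at $P = .5$.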

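It is well known (see [1]) that $V_\beta(P)$ is the unique solution to

$$V_\beta(P) = \min\Big\{\sum_i P_i C_i + \beta V_\beta(TP);\ \sum_i P_i I_i + \beta \sum_i P_i V_\beta(e^i);\ \sum_i P_i R_i + \beta V_\beta(e^0)\Big\}, \quad P \in S \tag{1}$$

and any rule $R_\beta$ which when in state $P$ selects an action which minimizes
the right side of (1) is $\beta$-optimal, i.e. $\psi(P, \beta, R_\beta) = V_\beta(P)$ for all $P \in S$.

Definitions:

The $\beta$-optimal produce region = $\{P : V_\beta(P) = \sum_i P_i C_i + \beta V_\beta(TP)\}$

The $\beta$-optimal inspect region = $\{P : V_\beta(P) = \sum_i P_i I_i + \beta \sum_i P_i V_\beta(e^i)\}$

The $\beta$-optimal revise region = $\{P : V_\beta(P) = \sum_i P_i R_i + \beta V_\beta(e^0)\}$
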
Lemma 2.1: $V_\beta(P)$ is a concave function of $P$, i.e. if $P = \lambda P^1 + (1-\lambda)P^2$
then $V_\beta(P) \ge \lambda V_\beta(P^1) + (1-\lambda)V_\beta(P^2)$.

Proof: Let $V_\beta^1(P) = \min\{\sum_i P_i C_i,\ \sum_i P_i I_i,\ \sum_i P_i R_i\}$ and

$$V_\beta^n(P) = \min\Big\{\sum_i P_i C_i + \beta V_\beta^{n-1}(TP);\ \sum_i P_i I_i + \beta \sum_i P_i V_\beta^{n-1}(e^i);\ \sum_i P_i R_i + \beta V_\beta^{n-1}(e^0)\Big\}.$$

Then $V_\beta^1(P)$, being the minimum of three concave functions, is concave.
Assuming that $V_\beta^{n-1}(P)$ is concave we get that $V_\beta^n(P)$ is concave for the
same reason since $T(\lambda P^1 + (1-\lambda)P^2) = \lambda T(P^1) + (1-\lambda)T(P^2)$. Thus, by
induction, $V_\beta^n(P)$ is concave for all $n$. But $V_\beta^n(P)$ is just the minimal
expected cost incurred over $n$ stages and thus $V_\beta^n(P) \to V_\beta(P)$; a
pointwise limit of concave functions is concave.

Theorem 2.2: Both the $\beta$-optimal inspect and revise regions are convex.
Proof: Suppose

$$V_\beta(P^1) = \sum_i P_i^1 I_i + \beta \sum_i P_i^1 V_\beta(e^i)$$

and

$$V_\beta(P^2) = \sum_i P_i^2 I_i + \beta \sum_i P_i^2 V_\beta(e^i)$$

and let $P = \lambda P^1 + (1-\lambda)P^2$. Then

$$V_\beta(P) \ge \lambda V_\beta(P^1) + (1-\lambda)V_\beta(P^2) = \sum_i P_i I_i + \beta \sum_i P_i V_\beta(e^i)$$

but by (1) we get the reverse inequality. The same method
works for the revise region.

QED.

Often one is interested in an optimality criterion which does not
discount future costs. Such a criterion is the long-run expected average
cost per unit time. So for any policy $R$ define

$$\phi(P, R) = \limsup_{n \to \infty} E_R\Big[\sum_{t=0}^{n} C(X_t, A_t) \,\Big|\, X_0 = P\Big] \Big/ n.$$

Fix some $P^0 \in S$ and let $f_\beta(P) = V_\beta(P) - V_\beta(P^0)$. The following
theorem was proven by Ross in [1].

Theorem: If $\{f_\beta(P) : P \in S,\ \beta \in (0,1)\}$ is a uniformly bounded equicontinuous
family of functions then

(a) There exists a bounded function $f(P)$ and a constant $g$ such that

$$g + f(P) = \min\Big\{\sum_i P_i C_i + f(TP);\ \sum_i P_i I_i + \sum_i P_i f(e^i);\ \sum_i P_i R_i + f(e^0)\Big\}, \quad P \in S \tag{2}$$

(b) $g = \lim_{\beta \to 1} (1-\beta)V_\beta(P)$ for all $P \in S$; and for some sequence $\beta_\nu \to 1$,
$f(P) = \lim_{\nu \to \infty} f_{\beta_\nu}(P)$.

(c) If $R^*$ is any rule which when in state $P$ selects an action which
minimizes the right side of (2) then

$$g = \phi(P, R^*) = \min_R \phi(P, R) \quad \text{for all } P \in S.$$


From (b) of the above and Lemma 2.1 we thus have

Lemma 2.1*: If $\{f_\beta(P)\}$ is uniformly bounded and equicontinuous then
$f(P)$ is a concave function of $P$.

Theorem 2.2*: If $\{f_\beta(P)\}$ is uniformly bounded and equicontinuous then
both the average-cost optimal inspect and revise regions are convex.
Proof: Same as the proof of Theorem 2.2. (The average-cost regions are
defined by using equation (2) in the same manner as equation (1) was
used in the $\beta$-discount case.)

3.  A  Two-State  Production  Process 

In this section we shall suppose that there are two underlying
states: 0 (the good state) and 1 (the bad state). If the process is
in the good state at time $t$ and is not revised, then with
probability $\pi$ it will be in the bad state at time $t+1$, where it will
remain until it is revised, i.e. $P_{00} = 1 - \pi$, $P_{11} = 1$.

The cost of producing without inspection will be taken to be zero
for the good state ($C_0 = 0$) and $C$ for the bad state ($C_1 = C$). The
inspect cost $I$ and the revise cost $R$ will be assumed not to depend on
the underlying state, i.e. $I_0 = I_1 = I$, $R_0 = R_1 = R$. It shall be
assumed throughout that $C < I < R$ (this condition is natural since the
inspect cost is supposed to include some cost due to production).



We will only consider the discounted-cost case and will be
concerned with determining the structure of the optimal policy rather
than with computational algorithms.

Since there are only two states we may let $S = \{P : P \in [0,1]\}$; and
we say that $X_t = P$ if $P$ is the probability that at the beginning of
period $t$ the underlying process is in the bad state. Also in this
specialization of our general model we have that $TP = P + \pi - \pi P$
and

$$V_\beta(P) = \min\{CP + \beta V_\beta(TP);\ I + \beta P V_\beta(1) + \beta(1-P)V_\beta(\pi);\ R + \beta V_\beta(\pi)\}, \quad P \in [0,1]. \tag{3}$$

Lemma 3.1: $V_\beta(P)$ is monotone non-decreasing in $P$.

Proof: Let $V_\beta^1(P) = \min\{CP, I, R\}$ and recursively

$$V_\beta^n(P) = \min\{CP + \beta V_\beta^{n-1}(TP);\ I + \beta P V_\beta^{n-1}(1) + \beta(1-P)V_\beta^{n-1}(\pi);\ R + \beta V_\beta^{n-1}(\pi)\}. \tag{4}$$

Then it is easily seen by induction that $V_\beta^n(P)$ is monotone for all $n$ and
thus that $V_\beta(P)$ is monotone.

QED.

Lemma 3.2: Every $\beta$-optimal policy produces at all $P$ such that $0 \le P \le \pi$.
Proof: Suppose some optimal policy inspects at $P \in [0, \pi]$. Then

$$V_\beta(P) = I + \beta P V_\beta(1) + \beta(1-P)V_\beta(\pi) \ge I + \beta V_\beta(\pi) \ge I + \beta V_\beta(P)$$

by monotonicity. Thus $V_\beta(P) \ge I/(1-\beta)$.
However by (3) $V_\beta(1) \le C + \beta V_\beta(1)$ and thus $V_\beta(P) \le V_\beta(1) \le C/(1-\beta)$,
which is a contradiction since $C < I$. A similar kind of contradiction is
arrived at if an optimal policy revises at $P \in [0, \pi]$.

QED.

Theorem 3.3: An optimal policy $R_\beta$ may be determined by three numbers
$P_1, P_2, P_3$ with $\pi \le P_1 \le P_2 \le P_3 \le 1$ such that $R_\beta$ produces for $0 \le P < P_1$,
inspects for $P_1 \le P < P_2$, produces for $P_2 \le P < P_3$, and revises for $P \ge P_3$.
Proof: By Lemma 3.1 and (3) it follows that the $\beta$-optimal revise
region may be taken to be a right-hand interval. The result then follows
from Lemma 3.2 and Theorem 2.2.

QED.

Thus the $\beta$-optimal policy may be described graphically as follows:

    Produce without  |           |  Produce without  |
    inspection       |  Inspect  |  inspection       |  Revise
    0               P_1         P_2                 P_3          1


It is however somewhat counter-intuitive to have two disjoint produce
regions. Intuitively it would seem likely that the second produce
region could always be taken to be vacuous. That this is not so, and
thus that sometimes four distinct regions are necessary to characterize
the $\beta$-optimal policy, is shown by the following example.

Example: $C = 4$, $\pi = .1$, $R = 10$, $I = 6$.

Then by letting $\beta = 1$ and using (4) we can show that

$$V_\beta^7(P) = \begin{cases}
20.87P + 7.13 & P \le .471 \\
8.13P + 13.13 & .471 \le P < .493 \\
17.13 & P \ge .493
\end{cases}$$

and thus

$$V_\beta^8(P) = \begin{cases}
\min\{22.78P + 9.22;\ 7.91P + 15.22;\ 19.22\} & \text{for } P < .412 \\
\min\{11.31P + 13.94;\ 7.91P + 15.22;\ 19.22\} & \text{for } .412 \le P < .437 \\
\min\{4P + 17.13;\ 7.91P + 15.22;\ 19.22\} & \text{for } P \ge .437
\end{cases}$$

thus

$$V_\beta^8(P) = \begin{cases}
22.78P + 9.22 & P < .404 \quad (\mathrm{P}) \\
7.91P + 15.22 & .404 \le P < .489 \quad (\mathrm{I}) \\
4P + 17.13 & .489 \le P < .521 \quad (\mathrm{P}) \\
19.22 & P \ge .521 \quad (\mathrm{R})
\end{cases}$$

Thus for $\beta$ near 1 the $\beta$-optimal eight-stage policy starts off by producing
for $P \in [0, .404)$, inspecting for $P \in [.404, .489)$, producing again for
$P \in [.489, .521)$, and revising for $P \ge .521$. Thus we see that four distinct
action regions might be necessary. The next theorems give sufficient
conditions for the optimal policy to have a simpler form than the general
one given by Theorem 3.3.
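The four action regions of the example can be checked numerically by running recursion (4) directly. The following Python sketch is an illustration, not part of the original argument: the function names `action_costs` and `best_action` are hypothetical, and memoisation with `lru_cache` is only a convenience that keeps the recursion small.

```python
from functools import lru_cache

# Parameters of the Example, with beta = 1 as in the text.
C, I, R, PI, BETA = 4.0, 6.0, 10.0, 0.1, 1.0


def T(p):
    """Probability of the bad state one period later when nothing is observed."""
    return p + PI - PI * p


@lru_cache(maxsize=None)
def action_costs(n, p):
    """Costs of (produce, inspect, revise) in the n-stage problem at belief p."""
    if n == 1:
        return (C * p, I, R)

    def v(q):
        # Minimal (n-1)-stage cost at belief q.
        return min(action_costs(n - 1, q))

    produce = C * p + BETA * v(T(p))
    inspect = I + BETA * (p * v(1.0) + (1.0 - p) * v(PI))
    revise = R + BETA * v(PI)
    return (produce, inspect, revise)


def best_action(n, p):
    """Label of the cheapest first action: 'P', 'I' or 'R'."""
    costs = action_costs(n, p)
    return "PIR"[costs.index(min(costs))]


# One sample belief from each of the four intervals of the eight-stage policy.
for p in (0.30, 0.45, 0.50, 0.60):
    print(p, best_action(8, p))
```

For the four sample beliefs the first action comes out as produce, inspect, produce, revise, in agreement with the intervals $[0, .404)$, $[.404, .489)$, $[.489, .521)$ and $[.521, 1]$ stated above.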
For $n \ge 1$ let $T^n P = T(T^{n-1}P)$ where $T^0 P = P$; then

$$T^1 P = \pi + (1-\pi)P$$

$$T^2 P = \pi + (1-\pi)\pi + (1-\pi)^2 P$$

$$T^n P = \pi + (1-\pi)\pi + \ldots + (1-\pi)^{n-1}\pi + (1-\pi)^n P = 1 - (1-P)(1-\pi)^n$$

Let $R^0$ be the policy that always produces without inspection (always
takes action P). Then

$$\psi(P, \beta, R^0) = \sum_{n=0}^{\infty} \beta^n C\, T^n P = C \sum_{n=0}^{\infty} \beta^n \big(1 - (1-P)(1-\pi)^n\big) = \frac{C}{1-\beta} - \frac{C(1-P)}{1 - \beta(1-\pi)} \tag{5}$$
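As a quick numerical sanity check of the closed form for $T^n P$ and of (5), the sketch below compares them with direct iteration and a truncated series. The parameter values and the function names are assumptions chosen only for illustration.

```python
def T(p, pi):
    """One-step belief update without inspection: TP = P + pi - pi*P."""
    return p + pi - pi * p


def always_produce_cost_series(p, C, pi, beta, terms=2000):
    """Truncated series sum_n beta^n * C * T^n P for the always-produce policy."""
    total, q = 0.0, p
    for n in range(terms):
        total += beta ** n * C * q
        q = T(q, pi)
    return total


def always_produce_cost_closed(p, C, pi, beta):
    """Closed form (5): C/(1-beta) - C(1-P)/(1-beta(1-pi))."""
    return C / (1.0 - beta) - C * (1.0 - p) / (1.0 - beta * (1.0 - pi))


# Assumed sample values.
p0, C, pi, beta = 0.3, 4.0, 0.1, 0.9

# T^5 P by iteration versus 1 - (1-P)(1-pi)^5.
q = p0
for _ in range(5):
    q = T(q, pi)
closed_T5 = 1.0 - (1.0 - p0) * (1.0 - pi) ** 5

series = always_produce_cost_series(p0, C, pi, beta)
closed = always_produce_cost_closed(p0, C, pi, beta)
```

With these sample values both `series` and `closed` are approximately 25.26, and `q` agrees with `closed_T5` up to rounding.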


Theorem 3.4: (a) $R^0$ is $\beta$-optimal if and only if $R \ge \dfrac{C}{1 - \beta(1-\pi)}$.

(b) If $R < \dfrac{C}{1 - \beta(1-\pi)}$ every $\beta$-optimal policy revises
for $P$ near 1.


Proof: If $R \ge \dfrac{C}{1 - \beta(1-\pi)}$ then it can be checked by direct substitution
that $\psi(P, \beta, R^0)$ satisfies (3) and thus $R^0$ is optimal. If $R^0$ is optimal
then by (3) we have that
$\psi(1, \beta, R^0) \le R + \beta\psi(\pi, \beta, R^0)$, which implies by (5) that

$$\frac{C}{1-\beta} \le R + \frac{\beta \pi C}{(1-\beta)(1 - \beta(1-\pi))}$$

or

$$R \ge \frac{C}{1 - \beta(1-\pi)}.$$

To prove (b) we note by (3) that if an optimal policy doesn't revise
for $P = 1$, then

$$V_\beta(1) = \frac{C}{1-\beta} \le R + \beta V_\beta(\pi) \le R + \beta\psi(\pi, \beta, R^0) = R + \frac{\beta \pi C}{(1-\beta)(1 - \beta(1-\pi))}$$

which implies that $R \ge \dfrac{C}{1 - \beta(1-\pi)}$. The result follows for all $P$ near 1 by the
continuity of $V_\beta(P)$. (The continuity of $V_\beta(P)$ is proven in the next Lemma.)

QED.
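The "direct substitution" step in part (a) can be written out as follows. This is a worked sketch; the abbreviations $a$ and $b$ are introduced here for convenience and are not in the original.

```latex
% Abbreviations (introduced for this sketch only):
%   a = C/(1-\beta), \quad b = C/(1-\beta(1-\pi)), \quad
%   \psi(P) = \psi(P,\beta,R^0) = a - b(1-P) \text{ by (5)}.
% Note that 1 - TP = (1-P)(1-\pi), \ (1-\beta)a = C, \ (1-\beta(1-\pi))b = C.
\begin{align*}
\text{produce:}\quad & CP + \beta\psi(TP)
    = CP + \beta a - \beta b(1-\pi)(1-P) = \psi(P), \\
\text{inspect:}\quad & I + \beta P\psi(1) + \beta(1-P)\psi(\pi)
    = I + \beta a - \beta b(1-\pi)(1-P) = \psi(P) + (I - CP), \\
\text{revise:}\quad & R + \beta\psi(\pi)
    = R + \beta a - \beta b(1-\pi) = \psi(P) + (R - bP).
\end{align*}
% Since I > C \ge CP, the inspect term always exceeds \psi(P).
% The revise term is at least \psi(P) for every P in [0,1] exactly when
% R \ge b = C/(1-\beta(1-\pi)), the worst case being P = 1.
```

So the produce term attains the minimum in (3) for every $P$ precisely when $R \ge C/(1-\beta(1-\pi))$, which is the condition of part (a).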
The following Lemma will be needed in the sequel.


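Lemma 3.5:

$$|V_\beta^n(P_1) - V_\beta^n(P_2)| \le C|P_1 - P_2| \, \frac{1 - (\beta(1-\pi))^n}{1 - \beta(1-\pi)} \quad \text{for all } P_1, P_2, n.$$

Proof: The proof is by induction; the result is trivial for $n = 1$.
So assume it for $n - 1$. There are now three cases:

(i) $V_\beta^n(P_1) = CP_1 + \beta V_\beta^{n-1}(TP_1)$, which implies by (4) that

$$V_\beta^n(P_2) - V_\beta^n(P_1) \le C|P_2 - P_1| + \beta[V_\beta^{n-1}(TP_2) - V_\beta^{n-1}(TP_1)]$$

$$\le C|P_2 - P_1| + \beta C(1-\pi)|P_2 - P_1| \, \frac{1 - (\beta(1-\pi))^{n-1}}{1 - \beta(1-\pi)}$$

$$= C|P_2 - P_1| \, \frac{1 - (\beta(1-\pi))^n}{1 - \beta(1-\pi)}$$

(ii) $V_\beta^n(P_1) = I + \beta P_1 V_\beta^{n-1}(1) + \beta(1-P_1)V_\beta^{n-1}(\pi)$, which implies by (4) that

$$V_\beta^n(P_2) - V_\beta^n(P_1) \le \beta|P_1 - P_2| \, [V_\beta^{n-1}(1) - V_\beta^{n-1}(\pi)]$$

$$\le \beta|P_1 - P_2| \, C(1-\pi) \, \frac{1 - (\beta(1-\pi))^{n-1}}{1 - \beta(1-\pi)}$$

$$\le C|P_1 - P_2| \, \frac{1 - (\beta(1-\pi))^n}{1 - \beta(1-\pi)}$$

(iii) $V_\beta^n(P_1) = R + \beta V_\beta^{n-1}(\pi)$, which implies that

$$V_\beta^n(P_2) - V_\beta^n(P_1) \le 0.$$

The result then follows by interchanging $P_1$ and $P_2$.

QED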
The following corollary is immediate.


Corollary 3.6:

$$|V_\beta(P_1) - V_\beta(P_2)| \le \frac{C|P_1 - P_2|}{1 - \beta(1-\pi)}$$


Theorem 3.7 (Sufficient Conditions):

(a)

$$\frac{(1-\beta)(R + \beta V_\beta(\pi))}{C} \le \frac{R - I}{\beta(V_\beta(1) - V_\beta(\pi))}$$

is a sufficient condition for the existence of a $\beta$-optimal policy which
produces for $P < P_1$, inspects for $P_1 \le P < P_2$, and revises for $P \ge P_2$,
for some $\pi \le P_1 \le P_2 \le 1$.

(b)

$$I + \beta(V_\beta(1) - V_\beta(\pi)) \ge \frac{C}{1 - \beta(1-\pi)} \quad \text{or} \quad \frac{R - I}{\beta(V_\beta(1) - V_\beta(\pi))} \le \frac{R}{C}\,(1 - \beta(1-\pi))$$

is a sufficient condition for the existence of a $\beta$-optimal policy which
produces for $P < P_1$ and revises for $P \ge P_1$ for some $P_1 \ge \pi$, i.e. no
inspection region.

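Proof: (a) Let $P_1$ and $P_2$ be such that $CP_1 + \beta V_\beta(TP_1) = R + \beta V_\beta(\pi)$ and
$I + \beta P_2 V_\beta(1) + \beta(1-P_2)V_\beta(\pi) = R + \beta V_\beta(\pi)$. If such a $P_i$ doesn't exist
then let it be infinite, $i = 1, 2$. Then using the fact that $CP + \beta V_\beta(TP)$
is monotone and concave it follows that a necessary condition for every
$\beta$-optimal policy to have four distinct action intervals is for $P_1 > P_2$.
(See Figure 1.)

Figure 1. The three terms of (3) plotted against $P$: the inspect term
$I + \beta P V_\beta(1) + \beta(1-P)V_\beta(\pi)$, the produce term $CP + \beta V_\beta(TP)$, and the
constant revise term $R + \beta V_\beta(\pi)$.

But if $P_1 > P_2$ then $V_\beta(P_1) = R + \beta V_\beta(\pi)$ and thus by monotonicity, since
$R + \beta V_\beta(\pi)$ bounds $V_\beta$ from above, $V_\beta(TP_1) = R + \beta V_\beta(\pi)$. Thus
$CP_1 + \beta(R + \beta V_\beta(\pi)) = R + \beta V_\beta(\pi)$ or

$$P_1 = \frac{(1-\beta)(R + \beta V_\beta(\pi))}{C}.$$

Thus $P_1 > P_2$ implies that

$$\frac{(1-\beta)(R + \beta V_\beta(\pi))}{C} > \frac{R - I}{\beta(V_\beta(1) - V_\beta(\pi))}$$

and thus (a) is proven.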


(b) In order for every $\beta$-optimal policy to inspect at $P$ we must have

$$I + \beta P V_\beta(1) + \beta(1-P)V_\beta(\pi) < R + \beta V_\beta(\pi) \tag{6}$$

and

$$I + \beta P V_\beta(1) + \beta(1-P)V_\beta(\pi) < CP + \beta V_\beta(TP). \tag{7}$$

Now (6) implies that $P < \dfrac{R - I}{\beta(V_\beta(1) - V_\beta(\pi))}$. From (7) and
Corollary 3.6 we get

$$I + \beta P V_\beta(1) + \beta(1-P)V_\beta(\pi) < CP + \beta\Big[V_\beta(\pi) + \frac{C}{1 - \beta(1-\pi)}(1-\pi)P\Big]$$

which implies that

$$P > \frac{I}{\dfrac{C}{1 - \beta(1-\pi)} - \beta(V_\beta(1) - V_\beta(\pi))}.$$

Thus we would need both that

$$\frac{R - I}{\beta(V_\beta(1) - V_\beta(\pi))} > \frac{I}{\dfrac{C}{1 - \beta(1-\pi)} - \beta(V_\beta(1) - V_\beta(\pi))}$$

and

$$\frac{I}{\dfrac{C}{1 - \beta(1-\pi)} - \beta(V_\beta(1) - V_\beta(\pi))} < 1.$$

Thus if either of the above inequalities doesn't hold then there exists
a $\beta$-optimal policy which never inspects. It is easy to see that it can
be taken to have the desired form.

QED


The conditions given in Theorem 3.7 unfortunately depend on $V_\beta(1)$
and $V_\beta(\pi)$. However, we can prove the following:

Corollary 3.8: If $R < \dfrac{C}{1 - \beta(1-\pi)}$ then either

$$I + \beta(R - C) \ge \frac{C}{1 - \beta(1-\pi)} \quad \text{or} \quad \frac{R - I}{\beta(R - C)} \le \frac{R}{C}\,(1 - \beta(1-\pi))$$

is a sufficient condition for the existence of a $\beta$-optimal policy which
produces for $P < P_1$ and revises for $P \ge P_1$.

Proof: $R < \dfrac{C}{1 - \beta(1-\pi)}$ implies by Theorem 3.4 that $V_\beta(1) = R + \beta V_\beta(\pi)$.
Thus $V_\beta(1) - V_\beta(\pi) = R + (\beta - 1)V_\beta(\pi) \ge R - C$, since $V_\beta(\pi) \le C/(1-\beta)$.
The result follows from Theorem 3.7.

QED

Note that if $R \ge \dfrac{C}{1 - \beta(1-\pi)}$ then $R^0$, the policy which always produces
(without inspection), is optimal.
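Because the conditions of Corollary 3.8 involve only the model parameters, they are easy to evaluate. The sketch below encodes them as reconstructed in this section: $R < C/(1-\beta(1-\pi))$ together with either $I + \beta(R-C) \ge C/(1-\beta(1-\pi))$ or $(R-I)/(\beta(R-C)) \le (R/C)(1-\beta(1-\pi))$. The function name and the sample parameter values are assumptions made for illustration.

```python
def corollary_3_8_applies(C, I, R, pi, beta):
    """True when the sufficient condition for a produce/revise policy holds.

    Assumes C < I < R, 0 < pi < 1 and 0 < beta < 1. A False result only means
    that the corollary gives no conclusion, not that inspection is needed.
    """
    bound = C / (1.0 - beta * (1.0 - pi))
    if not R < bound:
        # Hypothesis of the corollary fails (always producing is then optimal).
        return False
    first = I + beta * (R - C) >= bound
    second = (R - I) / (beta * (R - C)) <= (R / C) * (1.0 - beta * (1.0 - pi))
    return first or second


# Assumed sample parameters.
applies = corollary_3_8_applies(C=4.0, I=5.9, R=6.0, pi=0.1, beta=0.9)
no_conclusion = corollary_3_8_applies(C=4.0, I=6.0, R=10.0, pi=0.1, beta=0.9)
```

In the first call the second inequality holds, so a policy with no inspection region exists; in the second call neither inequality holds and the corollary is silent.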

4.  Unknown  $\pi$

We have assumed up to this point that all the parameters of the model,
$C$, $I$, $R$ and $\pi$, are known. However, while the cost parameters would probably
be known it is quite likely that $\pi$ will not be known with certainty. We
shall now give a method for estimating $\pi$ from past records of the process;
we also show what to do if an a priori distribution for $\pi$ is known.

(a)  Estimation of $\pi$

We shall suppose that the past records for the process yield the
following sort of data: $(n_1, Z_1), \ldots, (n_r, Z_r)$ where $n_i$ denotes the number of
periods succeeding the time at which the process was known to be in the
good state (either by a revision or by an inspection showing it to be good)
until it was next inspected, and $Z_i$ is 1 (0) if the inspection showed the
process to be good (bad).

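Then $P\{Z_i = 1\} = 1 - P\{Z_i = 0\} = (1-\pi)^{n_i}$, and so the probability
density of $Z_i$ is given by

$$f(z_i) = (1-\pi)^{n_i z_i}\big(1 - (1-\pi)^{n_i}\big)^{1 - z_i}, \quad z_i = 0, 1$$

and the joint likelihood of all the $Z_i$'s is given by

$$L(Z_1, \ldots, Z_r) = \prod_{i=1}^{r} (1-\pi)^{n_i Z_i}\big(1 - (1-\pi)^{n_i}\big)^{1 - Z_i}$$

$$\log L(Z_1, \ldots, Z_r) = \sum_{i=1}^{r} n_i Z_i \log(1-\pi) + \sum_{i=1}^{r} (1 - Z_i)\log\big(1 - (1-\pi)^{n_i}\big)$$

$$\frac{d}{d\pi}\log L(Z_1, \ldots, Z_r) = -\frac{\sum_{i=1}^{r} n_i Z_i}{1-\pi} + \sum_{i=1}^{r} (1 - Z_i)\, n_i (1-\pi)^{n_i - 1}\big[1 - (1-\pi)^{n_i}\big]^{-1}$$

and so the maximum likelihood estimate $\hat{\pi}$ is given by

$$\hat{\pi} = \begin{cases}
1, & \text{if } Z_i = 0 \text{ for all } i \\
0, & \text{if } Z_i = 1 \text{ for all } i \\
\text{the solution to } \dfrac{\sum_{i=1}^{r} n_i Z_i}{1-\pi} = \sum_{i=1}^{r} (1 - Z_i)\, n_i (1-\pi)^{n_i - 1}\big[1 - (1-\pi)^{n_i}\big]^{-1}, & \text{otherwise}
\end{cases}$$

Special Case: if $n_i = n$ for all $i = 1, \ldots, r$ then

$$\hat{\pi} = 1 - \Big(\sum_{i=1}^{r} Z_i \Big/ r\Big)^{1/n}$$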
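When the $n_i$ differ, the likelihood equation has no closed-form solution, but the log-likelihood is concave in $\pi$, so its derivative changes sign at most once on $(0,1)$. A minimal sketch of solving it by bisection follows; the function names and the sample records are hypothetical, and bisection is simply one convenient root finder rather than a method taken from the paper.

```python
def score(pi, data):
    """Derivative of the log-likelihood with respect to pi.

    data is a list of (n_i, z_i) pairs with n_i >= 1 and z_i in {0, 1}.
    """
    q = 1.0 - pi
    total = 0.0
    for n, z in data:
        if z == 1:
            total -= n / q
        else:
            total += n * q ** (n - 1) / (1.0 - q ** n)
    return total


def mle_pi(data, tol=1e-12):
    """Maximum likelihood estimate of pi, found by bisection on the score."""
    if all(z == 0 for _, z in data):
        return 1.0
    if all(z == 1 for _, z in data):
        return 0.0
    lo, hi = 1e-12, 1.0 - 1e-12
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid, data) > 0.0:
            lo = mid   # likelihood still increasing: root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Assumed sample records: four inspections, each 3 periods after a known good state.
records = [(3, 1), (3, 0), (3, 1), (3, 1)]
estimate = mle_pi(records)
special_case = 1.0 - (3.0 / 4.0) ** (1.0 / 3.0)
```

With equal $n_i$ the numerical estimate should coincide with the special-case formula, which gives a simple way to check the routine before applying it to unequal inspection gaps.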


(b)  Prior Distribution for $\pi$

We suppose that we have an a priori density $g_0$, i.e.
$P\{\pi \le x\} = \int_0^x g_0(y)\,dy$, $0 \le x \le 1$, and that we are interested in minimizing
the expected $\beta$-discounted costs.

We shall say that the system is in state $(P(\pi), g)$ at time $t$, i.e.
$X_t = (P(\pi), g)$, if $P(\pi)$ denotes the probability (possibly as a function of
the unknown $\pi$) that the process is in the bad state at time $t$, and if $g$ is
the posterior (given everything that has happened up to time $t$) density of $\pi$.

For $t = 0, 1, \ldots$ let $X_t = (P_t(\pi), g_t)$. We shall assume that $P_0(\pi)$
is either of the form $P_0(\pi) = P$ or $P_0(\pi) = (1-\pi)P + \pi$ where $P$ is some
number in $[0,1]$. Thus $P_0(\pi)$ is monotone non-decreasing in $\pi$ and from
this it follows that $P_t(\pi)$ will be monotone non-decreasing in $\pi$. This
is so because $P_{t+1}(\pi)$ is either $\pi$ or 1, or $(1-\pi)P_t(\pi) + \pi$. We can thus
let the state space be

$$S = \{(P(\pi), g) : g \text{ is a probability density on } [0,1],\ 0 \le P(\pi) \le 1 \text{ for all } \pi \in [0,1],\ P(\pi) \text{ is monotone non-decreasing in } \pi\}.$$

Letting $V_\beta(P(\pi), g)$ denote the expected $\beta$-discounted cost incurred
over an infinite time span given that the process starts in state
$(P(\pi), g)$ and an optimal policy is employed, we have that

$$V_\beta(P(\pi), g) = \min \begin{cases}
C E_g P(\pi) + \beta V_\beta(TP(\pi), g) \\
I + \beta E_g P(\pi)\, V_\beta(1, g_1^{P(\pi)}) + \beta(1 - E_g P(\pi))\, V_\beta(\pi, g_2^{P(\pi)}) \\
R + \beta V_\beta(\pi, g)
\end{cases} \tag{8}$$

where

$$TP(\pi) = (1-\pi)P(\pi) + \pi$$

$$E_g P(\pi) = \int_0^1 P(\pi) g(\pi)\,d\pi$$

$$g_1^{P(\pi)}(x) = \frac{P(x) g(x)}{\int_0^1 P(x) g(x)\,dx} = \frac{P(x) g(x)}{E_g P(\pi)}$$

$$g_2^{P(\pi)}(x) = \frac{(1 - P(x)) g(x)}{\int_0^1 (1 - P(x)) g(x)\,dx} = \frac{(1 - P(x)) g(x)}{1 - E_g P(\pi)}$$

where by $P(x)$ we mean $P(\pi)$ evaluated at $\pi = x$.
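The two posterior densities that enter (8) can be approximated on a grid. The sketch below is an illustration under stated assumptions: the density of $\pi$ is represented by its values at equally spaced midpoints of $[0,1]$, integrals are replaced by Riemann sums, and the names `posterior` and `mean` are hypothetical.

```python
def posterior(grid, g, P, found_bad):
    """Posterior density of pi on the grid after one inspection.

    grid: equally spaced midpoints covering [0, 1]; g: prior density values
    on the grid; P: function x -> probability of the bad state given pi = x;
    found_bad: True if the inspection showed the bad state.
    """
    weights = [(P(x) if found_bad else 1.0 - P(x)) * gx for x, gx in zip(grid, g)]
    h = 1.0 / len(grid)
    total = sum(weights) * h          # approximates E_g P(pi) or 1 - E_g P(pi)
    return [w / total for w in weights]


def mean(grid, dens):
    """Riemann-sum mean of a density given on the grid."""
    return sum(x * d for x, d in zip(grid, dens)) / len(grid)


# Assumed example: uniform prior and P(pi) = pi, the belief one period after
# the process was known to be good.
m = 1000
grid = [(k + 0.5) / m for k in range(m)]
prior = [1.0] * m
after_bad = posterior(grid, prior, lambda x: x, found_bad=True)
after_good = posterior(grid, prior, lambda x: x, found_bad=False)
```

In this example finding the process bad shifts the posterior mean of $\pi$ up from 1/2 to about 2/3, while finding it good shifts the mean down to about 1/3.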
As before we may also define

$$V_\beta^1(P(\pi), g) = \min\{C E_g P(\pi);\ I;\ R\} = C E_g P(\pi)$$

$$V_\beta^{n+1}(P(\pi), g) = \min \begin{cases}
C E_g P(\pi) + \beta V_\beta^n(TP(\pi), g) \\
I + \beta E_g P(\pi)\, V_\beta^n(1, g_1^{P(\pi)}) + \beta(1 - E_g P(\pi))\, V_\beta^n(\pi, g_2^{P(\pi)}) \\
R + \beta V_\beta^n(\pi, g)
\end{cases} \tag{9}$$

Thus the finite stage problem may be solved recursively; and
$V_\beta^n(P(\pi), g) \to V_\beta(P(\pi), g)$ as $n \to \infty$.

In this paper we have only considered the case that the true state
of the production process is observable upon inspection of the item
produced. However often one would not learn the true state upon
inspection but would rather get some additional (not necessarily
exhaustive) information about the true state. The first paper dealing
with this latter model was that of Girshick and Rubin [3]. They however
incorrectly stated that the average cost optimal policy may be
characterized by three action regions (P, I, R, in that order along the
interval). The first counterexample showing the Girshick-Rubin solution
to be in error was given by Taylor [6]. Tafeen [5] has recently treated
a similar model and has shown that under some restrictions on the
information pattern and state space the optimal policy may be
characterized by three regions. However his result doesn't hold if the
state space is allowed to be the whole interval $[0,1]$. Future research
on the general Girshick-Rubin model (under both an average and
discounted cost criterion) is thus needed.

It would for example be interesting to know if the optimal policy may
be characterized by four regions as in the present paper.